-
Notifications
You must be signed in to change notification settings - Fork 31
refactor(swtbench): regroup all hyperparameters in constants.py #372
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This commit creates a single source of truth for all SWTBench hyperparameters and constant values by: 1. Creating benchmarks/swtbench/constants.py with all constants: - Docker/Image related constants (prefixes, registries, build targets) - Dataset related constants (default dataset, split, model name) - Environment variable names - Default values for various parameters - File/directory paths (repo dir, evaluation results dir, report filename) - Git/repository related constants - Patch processing constants - Environment setup commands 2. Updating all swtbench modules to import from constants.py: - run_infer.py - eval_infer.py - build_eval_env_images.py - image_utils.py This makes it easier to check and modify hyperparameters for the benchmark as they are now centralized in one location. Fixes #364 Co-authored-by: openhands <openhands@all-hands.dev>
e275ab3 to
328490e
Compare
…ility
- Add typing.Final annotations to all constants for type safety
- Convert mutable lists to immutable tuples:
- SETUP_FILES_TO_REMOVE
- BUILD_MODE_CHOICES
- DEFAULT_ENV_SETUP_COMMANDS
- Add BuildMode enum for type-safe build mode selection
- Convert string numeric constants to proper int types:
- DEFAULT_STARTUP_TIMEOUT: '600' -> 600
- DEFAULT_EVAL_WORKERS: '12' -> 12
- Update callers to handle type changes:
- eval_infer.py: workers parameter now int, add type=int to argparse
- run_infer.py: convert tuple to list for env_setup_commands,
convert int to str for os.getenv default
Co-authored-by: openhands <openhands@all-hands.dev>
|
Looks like there are a few issues preventing this PR from being merged!
If you'd like me to help, just leave a comment, like Feel free to include any additional details that might help me get this PR into a better state. You can manage your notification settings |
- Add TargetType alias to constants.py matching openhands.sdk.workspace.TargetType - Update DEFAULT_BUILD_TARGET to use TargetType instead of str - Update run_infer.py to use TargetType for build_target variable and function parameter Co-authored-by: openhands <openhands@all-hands.dev>
|
@OpenHands |
|
I'm on it! simonrosenberg can track my progress at all-hands.dev |
🔴 Code Review: PR #372 - SWTBench Constants RefactorTaste Rating: 🟡 Acceptable - Works but has one behavioral change that needs attention[CRITICAL ISSUES] (Must fix - these break fundamental principles)[benchmarks/swtbench/eval_infer.py, Lines 246-248] # Original:
def run_swtbench_evaluation(..., workers: str = "12") -> None:
# New:
def run_swtbench_evaluation(..., workers: int = DEFAULT_EVAL_WORKERS) -> None:While this works at runtime (Python is dynamically typed), it's a function signature change that could break type-checked callers. The argparse also changed from accepting a string to Recommendation: If this is intentional (improving type correctness), document it in the PR description as a breaking change. Otherwise, keep [IMPROVEMENT OPPORTUNITIES] (Should fix - violates good taste)[benchmarks/swtbench/constants.py, Line 35] 🔍 Magic Number: [benchmarks/swtbench/constants.py, Lines 38-49] 🔧 Inconsistent Enum Usage: You define DEFAULT_BUILD_MODE: Final[str] = BuildMode.CLI.value # Why not just BuildMode.CLI?
BUILD_MODE_CHOICES: Final[Tuple[str, ...]] = tuple(m.value for m in BuildMode)Consider using the enum directly and converting to string only at the argparse boundary. [benchmarks/swtbench/constants.py, Line 13] 🔧 Inconsistent Type Definition: [STYLE NOTES] (Minor observations)[benchmarks/swtbench/constants.py] 📝 The heavy use of section separators ( [benchmarks/swtbench/eval_infer.py, Line 214] ✅ VERDICT:❌ Needs minor rework: The KEY INSIGHT:The refactoring successfully centralizes constants without changing their values, but inadvertently "improves" the Suggested Next Steps for Cleaner Code:
|
Remove the BuildMode enum class and replace with simple string constants. The enum was only used to generate choices and default values, which can be done more simply with a tuple and string constant. Co-authored-by: openhands <openhands@all-hands.dev>
Summary
This PR creates a single source of truth for all SWTBench hyperparameters and constant values by introducing
benchmarks/swtbench/constants.py.Fixes #364
Changes
1. Created
benchmarks/swtbench/constants.pyThis new file contains all constant values organized into logical categories:
TargetTypeliteral type alias for build targetsBuildModeenum with API and CLI options2. Updated all swtbench modules to import from constants.py
run_infer.pyeval_infer.pybuild_eval_env_images.pyimage_utils.py3. Type Safety Improvements
typing.Finalannotations for immutabilitySETUP_FILES_TO_REMOVEBUILD_MODE_CHOICESDEFAULT_ENV_SETUP_COMMANDSBuildModeenum for type-safe build mode selectionTargetTypeliteral type for build targets (matchesopenhands.sdk.workspace.TargetType)inttypes instead of strings:DEFAULT_STARTUP_TIMEOUT: 600 (int)DEFAULT_EVAL_WORKERS: 12 (int)Benefits
Finalannotations and appropriate types